Skip to content

Conversation

@martijnvg
Copy link
Member

Backporting #113757 to 8.x branch.

The keyword doc values field gets an extra sorted doc values field, that encodes the order of how array values were specified at index time. This also captures duplicate values. This is stored in an offset to ordinal array that gets zigzag vint encoded into a sorted doc values field.

For example, in case of the following string array for a keyword field: ["c", "b", "a", "c"].
Sorted set doc values: ["a", "b", "c"] with ordinals: 0, 1 and 2. The offset array will be: [2, 1, 0, 2]

Null values are also supported. For example ["c", "b", null, "c"] results into sorted set doc values: ["b", "c"] with ordinals: 0 and 1. The offset array will be: [1, 0, -1, 1]

Empty arrays are also supported by encoding a zigzag vint array of zero elements.

Limitations:

currently only doc values based array support for keyword field mapper. multi level leaf arrays are flattened. For example: [[b], [c]] -> [b, c] arrays are always synthesized as one type. In case of keyword field, [1, 2] gets synthesized as ["1", "2"]. These limitations can be addressed, but some require more complexity and or additional storage.

With this PR, keyword field array will no longer be stored in ignored source, but array offsets are kept track of in an adjacent sorted doc value field. This only applies if index.mapping.synthetic_source_keep is set to arrays (default for logsdb).

… source

Backporting elastic#113757 to 8.x branch.

The keyword doc values field gets an extra sorted doc values field, that encodes the order of how array values were specified at index time. This also captures duplicate values. This is stored in an offset to ordinal array that gets zigzag vint encoded into a sorted doc values field.

For example, in case of the following string array for a keyword field: ["c", "b", "a", "c"].
Sorted set doc values: ["a", "b", "c"] with ordinals: 0, 1 and 2. The offset array will be: [2, 1, 0, 2]

Null values are also supported. For example ["c", "b", null, "c"] results into sorted set doc values: ["b", "c"] with ordinals: 0 and 1. The offset array will be: [1, 0, -1, 1]

Empty arrays are also supported by encoding a zigzag vint array of zero elements.

Limitations:

currently only doc values based array support for keyword field mapper.
multi level leaf arrays are flattened. For example: [[b], [c]] -> [b, c]
arrays are always synthesized as one type. In case of keyword field, [1, 2] gets synthesized as ["1", "2"].
These limitations can be addressed, but some require more complexity and or additional storage.

With this PR, keyword field array will no longer be stored in ignored source, but array offsets are kept track of in an adjacent sorted doc value field. This only applies if index.mapping.synthetic_source_keep is set to arrays (default for logsdb).
@martijnvg martijnvg added backport auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :StorageEngine/Mapping The storage related side of mappings v8.19.0 labels Feb 20, 2025
@elasticsearchmachine elasticsearchmachine merged commit c805201 into elastic:8.x Feb 20, 2025
15 checks passed
@martijnvg martijnvg deleted the backport_113757 branch February 20, 2025 09:50
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 20, 2025
…hetic source

Update index version check after elastic#122997 backport was merged.
elasticsearchmachine pushed a commit that referenced this pull request Feb 20, 2025
…hetic source (#123006)

Update index version check after #122997 backport was merged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport :StorageEngine/Mapping The storage related side of mappings v8.19.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants